View-Action Representation Learning for Active First-Person Vision

Authors

Abstract

In visual navigation, a moving agent equipped with a camera is traditionally controlled by an input action, and the estimation of features from the sensory state (i.e., the view) is treated as a pre-processing step for high-level vision tasks. In this paper, we present a representation learning approach that, instead, considers both inputs: we condition the encoded feature of a transition network on the action that changes the view of the camera, thus describing the scene more effectively. Specifically, we introduce a module that generates decoded, higher-dimensional representations to increase the representational power, and we then fuse its output with an intermediate response that predicts the future state. To enhance the discrimination capability among the predictions of different actions, we further introduce triplet ranking and $N$-tuplet loss functions, which in turn can be integrated with a regression loss. We demonstrate the proposed approach on reinforcement and imitation learning-based mapless navigation tasks, where the agent learns to navigate only through the view and the performed action, without external information.
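The abstract describes an action-conditioned transition architecture trained with a ranking loss on top of a regression loss. The PyTorch sketch below is a rough, hedged illustration of that idea only: the class and function names (`ViewActionTransition`, `view_action_loss`), the layer sizes, the concatenation-based fusion, and the margin value are all assumptions, not the authors' implementation, and the $N$-tuplet loss is approximated by its triplet special case.

```python
# A minimal sketch (assumptions throughout, not the paper's code): an encoder maps the
# current view to a feature, a module expands it to a higher-dimensional representation,
# and a transition head conditioned on the action predicts the next view's feature.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ViewActionTransition(nn.Module):
    def __init__(self, feat_dim=128, action_dim=4, hidden_dim=256):
        super().__init__()
        # Encoder: view (image) -> low-dimensional feature (assumed small CNN backbone).
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(64, feat_dim),
        )
        # Module that lifts the encoded feature to a higher-dimensional representation
        # (standing in for the "decoded higher-dimensional representations" of the abstract).
        self.expand = nn.Sequential(nn.Linear(feat_dim, hidden_dim), nn.ReLU())
        # Transition conditioned on the action (one-hot), fused here by concatenation.
        self.transition = nn.Sequential(
            nn.Linear(hidden_dim + action_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, feat_dim),
        )

    def forward(self, view, action_onehot):
        z = self.encoder(view)                        # current-view feature
        h = self.expand(z)                            # higher-dimensional representation
        pred_next = self.transition(torch.cat([h, action_onehot], dim=-1))
        return z, pred_next                           # feature and predicted next feature


def view_action_loss(pred_next, true_next, neg_next, margin=0.2):
    """Regression loss on the predicted next-view feature plus a triplet ranking term
    that pulls the prediction toward the view actually reached by the taken action and
    away from a view reached by a different action. The margin is an assumed value."""
    reg = F.mse_loss(pred_next, true_next)
    triplet = F.triplet_margin_loss(pred_next, true_next, neg_next, margin=margin)
    return reg + triplet


if __name__ == "__main__":
    model = ViewActionTransition()
    views = torch.randn(8, 3, 64, 64)                 # batch of first-person views
    actions = F.one_hot(torch.randint(0, 4, (8,)), 4).float()
    z, pred_next = model(views, actions)
    # In practice the positive/negative features would be encoded from the frames that
    # follow the taken action and a different action; random tensors stand in here.
    true_next, neg_next = torch.randn(8, 128), torch.randn(8, 128)
    loss = view_action_loss(pred_next, true_next, neg_next)
    loss.backward()
    print(loss.item())
```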


Similar Articles

Action Recognition using ST-patch Features for First Person Vision

Much research has been devoted in recent years to recognizing human action from video images. Most existing methods, however, capture video of people from the outside, making it difficult to understand behavioral intention. The First Person Vision approach has been proposed in response to this problem. In this approach, a device consisting of two cameras collectively called an “inside-out camera” i...


View-Invariant Representation and Learning of Human Action

Automatically understanding human actions from video sequences is a very challenging problem. This involves the extraction of relevant visual information from a video sequence, representation of that information in a suitable form, and interpretation of visual information for the purpose of recognition and learning. In this paper, we first present a view-invariant representation of action consi...


View-Adaptive Metric Learning for Multi-view Person Re-identification

Person re-identification is a challenging problem due to drastic variations in viewpoint, illumination and pose. Most previous works on metric learning learn a global distance metric to handle those variations. Different from them, we propose a view-adaptive metric learning (VAML) method, which adopts different metrics adaptively for different image pairs under varying views. Specifically, give...


Feature Space Trajectory Representation for Active Vision

A new feature space trajectory (FST) description of 3-D distorted views of an object is advanced for active vision applications. In an FST, different distorted object views are vertices in feature space. A new eigen-feature space and Fourier transform features are used. Vertices for different adjacent distorted views are connected by straight lines so that an FST is created as the viewpoint chang...
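The FST description above is concrete enough for a small sketch: each distorted view becomes a vertex (a feature vector), adjacent views are joined by straight segments, and a query view can be scored by its distance to the nearest segment of the trajectory. The NumPy sketch below uses placeholder features and assumed function names; the original work's eigen-feature and Fourier transform features are abstracted away.

```python
# A minimal sketch of a feature space trajectory (FST), under the assumptions stated above:
# ordered view features are vertices, adjacent vertices are joined by straight segments,
# and a query view is scored by its distance to the nearest segment.
import numpy as np


def point_to_segment_distance(p, a, b):
    """Euclidean distance from point p to the segment a-b in feature space."""
    ab = b - a
    denom = np.dot(ab, ab)
    if denom == 0.0:                      # degenerate segment: both vertices coincide
        return np.linalg.norm(p - a)
    t = np.clip(np.dot(p - a, ab) / denom, 0.0, 1.0)
    return np.linalg.norm(p - (a + t * ab))


def fst_distance(query_feat, trajectory):
    """Smallest distance from a query view's feature to any segment of an FST,
    where `trajectory` is an (n_views, feat_dim) array of ordered view features."""
    return min(
        point_to_segment_distance(query_feat, trajectory[i], trajectory[i + 1])
        for i in range(len(trajectory) - 1)
    )


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fst = rng.normal(size=(10, 8))        # toy FST: 10 ordered views, 8-D placeholder features
    query = rng.normal(size=8)            # feature of a new, unseen view
    print("distance to FST:", fst_distance(query, fst))
```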

متن کامل

Visual Motif Discovery via First-Person Vision

Visual motifs are images of visual experiences that are significant and shared across many people, such as an image of an informative sign viewed by many people and that of a familiar social situation such as when interacting with a clerk at a store. The goal of this study is to discover visual motifs from a collection of first-person videos recorded by a wearable camera. To achieve this goal, ...


Journal

Journal title: IEEE Transactions on Circuits and Systems for Video Technology

Year: 2021

ISSN: 1051-8215, 1558-2205

DOI: https://doi.org/10.1109/tcsvt.2020.2987562